HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递)
2025.10.02 | MCTS Breaks Through the RLVR Bottleneck; GEM, an Open-Source Training Ground for Agents

Update: 2025-10-02
Description

The 15 papers covered in this episode are:

[00:19] 🧠 DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

[01:20] 🤖 GEM: A Gym for Agentic LLMs

[01:57] 🧠 VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

[02:36] 🎒 Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

[03:06] 🎬 Code2Video: A Code-centric Paradigm for Educational Video Generation

[03:41] ⚙ PIPer: On-Device Environment Setup via Online Reinforcement Learning

[04:11] 🗜 ACON: Optimizing Context Compression for Long-horizon LLM Agents

[04:52] 🔍 Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls

[05:22] ⚖ BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses

[06:01] ⚡ Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution

[06:42] 🚀 BroRL: Scaling Reinforcement Learning via Broadened Exploration

[07:25] 📊 Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

[08:02] 🎯 On Predictability of Reinforcement Learning Dynamics for Large Language Models

[08:31] 🖥 GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness

[09:17] 🧠 Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned

[Follow Us]

You can also find us on the following platform for more information beyond the podcast:

Xiaohongshu (小红书): AI速递
